ABSTRACT


In today's competitive educational landscape, higher education institutes
use data mining tools and techniques to improve students' academic
performance and to prevent dropout. The authors collected data from three
colleges of Assam, India. The data consist of socio-economic, demographic
and academic information of three hundred students with twenty-four
attributes. Four classification methods, the J48, PART, Random Forest and
Bayes Network classifiers, were used. The data mining tool used was WEKA.
The highly influential attributes were selected using the tool. The internal
assessment attribute in the continuous evaluation process has the highest
impact on the final semester results of the students in our dataset. The
results showed that Random Forest outperforms the other classifiers based on
accuracy and classifier errors. The Apriori algorithm was also used to mine
association rules among all the attributes, and the best rules are also
displayed.
Copyright © 2018 Institute of Advanced Engineering and Science. 
All rights reserved. 
Corresponding Author: 
Sadiq Hussain, 


Examination Branch, Dibrugarh University, India 
Email: sadig @ dibru.ac.in 


1. INTRODUCTION

Data Mining (DM) is one of the active fields in Computer Science (CS). It is a young and
promising field. Owing to the wide availability of huge amounts of data and the urgent need to
convert such data into useful information and knowledge, data mining has attracted a great deal of
interest in the information industry and in society in recent years [1]. DM focuses on the extraction of
hidden knowledge from various data warehouses, data marts, and repositories. Large volumes of data are
useless without proper utilization.

DM is sometimes also called knowledge discovery in databases (KDD). The two are similar in many
respects but differ in one essential point. DM seeks a subset Di of D that satisfies a logical
formula within the scope of the reduced matrix of Di. If DM cannot deduce any result from that logical formula,
KDD may, in contrast, still yield knowledge, even if that logical formula covers all the data as well as the
possibility of knowledge discovery. The main feature of both data mining and knowledge discovery is to derive
common expressions of characteristics that are shared by all elements in a set [2]. KDD and DM provide
techniques for extracting useful information from large amounts of data in a database [3, 4]. The
results of applying DM algorithms to any given or manually generated dataset can be called rule
discovery [5]. There are two main types of rules: production rules and association rules.
According to Quinlan [6], production rules are a common formalism for expressing knowledge in expert
systems. Decision tree rules can also be transformed into production rules [6]. Association rules were
first introduced to find relationships among the sales of different items from the analysis of large data [7].
DM has been applied in many fields. One of them is educational DM (EDM).

Educational data mining is an emerging field in the area of data mining. In this competitive world,
educational settings also use data mining tools to explore and analyze student performance, to predict
results in order to prevent dropout, to focus on both strong and academically weak performers, to give
feedback to faculty and instructors, to visualize data and to assess the learning process better. The quality
of education needs to be improved, and educational data mining is a tool for this improvement. Modern
educational institutes need data mining for their strategy and future plans. Students' performance depends on
various factors such as personal, social, economic and other environmental ones [8, 9]. The authorities of
top-level educational institutes may use the experimental results to understand the trends and
behaviors in students' performance, which may lead to the design of new pedagogical strategies [10].

There are a number of classification algorithms: Decision Tree, Neural Network, Naive Bayes,
K-Nearest Neighbor, Random Forest, AdaBoost, Support Vector Machines etc. [11]. In this research, the authors
use four of them for mining students' academic performance: the J48, BayesNet, PART
and Random Forest classification algorithms. The Apriori algorithm, an unsupervised learning method and
one of the most popular algorithms for association rule mining, was additionally used to reveal the hidden
rules in our dataset [12]. The authors compared the algorithms based on their accuracy to select the
best-performing algorithm for the job.

Classification is one of the predictive tasks [1] and is the most commonly used data mining
technique for predicting students' performance in educational institutes [11, 13, 14]. Several attributes
were considered in our study. To find the highly influential attributes, feature selection was conducted first.
Feature selection removes unnecessary attributes from the dataset so that useful and meaningful
information can be extracted. It makes the mining process faster and its results more meaningful. In the study,
students' end semester percentage is selected as the dependent parameter. The percentages are categorized as
‘Best’, ‘Very Good’, ‘Good’, ‘Pass’ and ‘Fail’. The data mining tool used for the study was WEKA (Waikato
Environment for Knowledge Analysis). WEKA is an open source tool written in Java that is widely used by data
miners [15]. WEKA implements most of the common machine learning algorithms and visualizes their results as well.

The paper is organized as follows: Section 2 reviews related literature and introduces the
classifier evaluation and error measurement techniques used in this research, followed by a discussion of how
bias in the selection of algorithms was avoided. Section 4 applies the data mining algorithms to the selected
dataset. Section 5 presents the experimental results, including the association rule mining work, and
Section 6 concludes the work.


2. LITERATURE REVIEW 

Ahmad et al. [16] designed a framework to predict the academic performance of first year
bachelor students of a computer science course. The dataset contained 8 years of data, from July 2006-07
to July 2013-14. The data collected covered various aspects of students' records including previous
academic records, family background and demographics. Three classifiers, viz. Decision Tree, Naive Bayes
and Rule-Based classifiers, were applied to predict the academic performance of students. The experiments
showed that the Rule-Based classifier was the best of the three, with an accuracy of
71.3%. The model predicted the first year students' level of success. Sumitha et al. [17] developed a
data model to predict students' future learning outcomes using a dataset of senior students. They compared
data mining classification algorithms and found that the J48 algorithm was best suited for the job based on
their data.

Khasanah et al. [18] conducted a study showing that highly influential attributes should be selected
carefully to predict student performance. Feature selection may be used before classification for this purpose.
The student data was from the Department of Industrial Engineering, Universitas Islam Indonesia. They used
Bayesian Network and Decision Tree algorithms for classification and prediction of student performance.
The feature selection methods showed that student's attendance and Grade Point Average in the first
semester topped the list of features. In terms of accuracy, the Bayesian Network
outperformed the Decision Tree classifier in their case. Ankita A Nichat et al. [19] built classification
models using decision tree and artificial neural network techniques. They used several attributes to assess the
strengths and weaknesses of the students in order to improve their performance.

Hilal Almarabeh [20] used the WEKA tool to evaluate the performance of university students. He
found that the accuracy of the classifier algorithms depends upon the size and nature of the data. The author used
the Naive Bayes, Bayesian Network, Neural Network, ID3 and J48 classification techniques. It was found that
the Bayesian Network outperforms the others in terms of accuracy. Amjad Abu Saa [21] worked out a qualitative
model to analyze student performance based on students' personal and social factors. The author explored
theoretically various factors affecting students' performance in the field of higher education.

Pedro Strecht et al. [10] predicted students' results (pass/fail) and their grades in their work. They
used a classification model for the students' results and a regression model for the prediction of the grades.
They carried out the experiments using data on students of 700 courses at the University of Porto.
They used decision trees and SVM for classification, while SVM, Random Forest, and AdaBoost.R2 were
best suited for regression analysis. The classification model was able to extract useful patterns, but the
models for regression were not able to beat a simple baseline. Fahim Sikder et al. [13] used Cumulative
Grade Point Average (CGPA) for prediction of students' yearly performance. The dataset used was from
the students' records of Bangabandhu Sheikh Mujibur Rahman Science and Technology University. The authors
used a neural network technique for prediction, and the predictions were compared with the real CGPA of the students.


2.1 Classifier Evaluations and error measurement techniques: 

The performance measures are derived from the confusion matrix [22]. A confusion matrix is formed
from the four outcomes of binary classification. In binary classification, the dataset usually has two labels,
positive (P) and negative (N). The outcomes are true positive (TP), i.e. correct positive prediction, true
negative (TN), i.e. correct negative prediction, false positive (FP), i.e. incorrect positive prediction, and false
negative (FN), i.e. incorrect negative prediction.


a. Sensitivity (Recall or True Positive Rate)
Recall is the number of correct positive classifications divided by the total number of positives. So,

R = TP / (TP + FN) = TP / P (1)

Here P on the right-hand side denotes the number of actual positives.

b. Precision
Precision is the number of correct positive classifications divided by the total number of positive
classifications. In equations (2) and (3), P denotes precision. So,

P = TP / (TP + FP) (2)

c. F-score
F-score is the harmonic mean of precision and recall. So,

F = 2PR / (P + R) (3)

d. Accuracy [23]
Accuracy is the number of all correct classifications divided by the total number of cases. Here P and N
again denote the numbers of actual positives and negatives. So,

Accuracy = (TP + TN) / (TP + TN + FN + FP) = (TP + TN) / (P + N) (4)
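
As an illustration of equations (1) to (4), the following minimal Python sketch computes the four measures from confusion-matrix counts. The function name `binary_metrics` and the counts are hypothetical and chosen only for illustration; they are not taken from the dataset of this study.

```python
def binary_metrics(tp, tn, fp, fn):
    """Compute recall, precision, F-score and accuracy from confusion-matrix counts."""
    recall = tp / (tp + fn)                                   # Equation (1)
    precision = tp / (tp + fp)                                # Equation (2)
    f_score = 2 * precision * recall / (precision + recall)   # Equation (3)
    accuracy = (tp + tn) / (tp + tn + fp + fn)                # Equation (4)
    return {
        "recall": recall,
        "precision": precision,
        "f_score": f_score,
        "accuracy": accuracy,
    }


# Hypothetical counts (assumption, for illustration only)
metrics = binary_metrics(tp=40, tn=45, fp=5, fn=10)
print(metrics)
```

The sketch covers the binary case only; for a multi-class target such as the five-level end semester percentage, the same quantities are usually computed per class by treating that class as positive and all others as negative.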
The following items describe the error measures used for the classification methods.


e. Mean Absolute Error (MAE) [24]
MAE estimates how far the predictions or forecasts differ from the actual values.

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert x_i - x \rvert \quad (5)$$

where $n$ is the number of errors and $\lvert x_i - x \rvert$ are the absolute errors.

f. Root Mean Square Error (RMSE) [24]
RMSE is an evaluator of the differences between the predicted values and the actual observed values.

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} \left(X_{obs,i} - X_{model,i}\right)^{2}}{n}} \quad (6)$$

where $X_{obs}$ is the observed values and $X_{model}$ is the modeled values at time/place $i$.

g. Relative Absolute Error (RAE) [20] [15]
RAE is defined as the ratio of the total absolute error to the total absolute deviation of the actual values
from their average. It is represented as below,

$$\mathrm{RAE} = \frac{\sum_{i=1}^{n} \lvert p_i - a_i \rvert}{\sum_{i=1}^{n} \lvert \bar{a} - a_i \rvert} \quad (7)$$

where $p_i$ is the forecast value, $a_i$ is the actual value and $\bar{a}$ is the average of the actual values.

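h. Root Relative Squared Error (RRSE) [20] [15]
RRSE is the square root of the total squared error divided by the total squared error of a simple
predictor that always outputs the average of the actual values. With the same symbols as in equation (7),
it can be represented as below,

$$\mathrm{RRSE} = \sqrt{\frac{\sum_{i=1}^{n} \left(p_i - a_i\right)^{2}}{\sum_{i=1}^{n} \left(a_i - \bar{a}\right)^{2}}} \quad (8)$$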
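
The four error measures can be sketched in a few lines of Python. This is only an illustrative sketch following equations (5) to (8) for plain numeric predictions; the function name `error_measures` and the sample values are hypothetical and are not drawn from the study's data.

```python
import math


def error_measures(predicted, actual):
    """Return (MAE, RMSE, RAE, RRSE) for paired predicted and actual values."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = [abs(p - a) for p, a in zip(predicted, actual)]
    sq_err = [(p - a) ** 2 for p, a in zip(predicted, actual)]

    mae = sum(abs_err) / n                                               # Equation (5)
    rmse = math.sqrt(sum(sq_err) / n)                                    # Equation (6)
    rae = sum(abs_err) / sum(abs(mean_actual - a) for a in actual)       # Equation (7)
    rrse = math.sqrt(
        sum(sq_err) / sum((a - mean_actual) ** 2 for a in actual)        # Equation (8)
    )
    return mae, rmse, rae, rrse


# Hypothetical values (assumption, for illustration only)
predicted = [2.0, 4.0, 6.0, 8.0]
actual = [1.0, 5.0, 6.0, 8.0]
mae, rmse, rae, rrse = error_measures(predicted, actual)
print(mae, rmse, rae, rrse)
```

RAE and RRSE are relative measures: a value below 1 (or below 100% when expressed as a percentage) means the model does better than always predicting the average.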


Avoiding bias in the algorithms selection: 

There have been many studies assessing student academic performance and predicting the dropout
of students and their job prospects [25]. The goal of such studies was to improve the quality of
education in higher educational institutes. Most of the studies consider the grade point average (GPA) [26,
27] as their response variable, while the explanatory variables vary. In our study, we used the final
semester percentage as our response variable, as a grading system has not yet been introduced at undergraduate
level in most of the courses in Assam.

Various classification methods have also been applied in studies of student academic performance
[16, 20]. These studies show that the accuracy obtained varies with the dataset. Some of the
studies found that decision trees are the best among the classification algorithms, whereas others found
that the Bayes Network performed better than the rest.

The authors applied four classification methods one after another, reaching an accuracy of
99% with Random Forest. The first method used by the authors was the Bayesian Network (BN).
Almarabeh [20] analyzed the performance of students of King Saud Bin Abdulaziz University for
Health Sciences and found that BN was the best-suited classification method. Directed acyclic graphs are
used in Bayesian networks to depict the dependencies among random variables. Random variables are
represented as nodes. If two nodes are connected by an arc, then the corresponding variables are dependent on
each other. BN has been used for performing bi-directional inference since 1980. It is also used for reasoning
under uncertainty.

The authors then tried the rule-based classification technique available as PART in WEKA. Ahmad
et al. [16] also used this technique for classification and found that it was the best technique for student
academic performance assessment among Naive Bayes, decision trees, and rule-based classifiers. PART is
a rule-based classifier which combines the separate-and-conquer method with the divide-and-conquer strategy. This
classification method builds a partial tree from the available set of records. It then creates a rule from the tree.
After discarding the decision tree and deleting the records covered by the rule, it again builds a partial decision
tree, and repeats this in an iterative manner.

The authors then used the decision tree classification method. Patil et al. [28] established that
the decision tree algorithm performs better than Naive Bayes methods. The advantage of using a decision tree
classifier is that the tree can be visualized, understood and interpreted easily by the users [29]. The tree
performs well with both numerical and categorical variables. The decision tree has a tree-like structure that
starts with a root node and ends with leaf nodes. It is therefore one of the most powerful as well as popular
classifiers. WEKA implements the C4.5 decision tree as the J48 classification method.

The authors used the random forest classifier as their next attempt. Random forests (RF) [11] are credited
with reducing overfitting, bias, and variance, which makes RF more accurate and robust. RF is based on the
bagging algorithm: each tree is built on a sample of the data drawn with replacement, and the partition is not
always done on the same important variable because the explanatory variables are also sampled. RF creates many
individual decision trees from the training set. It is good at predicting the target values.
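
The bagging idea behind RF can be sketched as follows. This is a minimal sketch in Python, not the WEKA implementation: the helper names `bootstrap_sample` and `majority_vote` are hypothetical, the records are placeholder integers, and combining the trees by majority vote is an assumption (an implementation may instead average class probabilities).

```python
import random
from collections import Counter


def bootstrap_sample(records, rng):
    """Draw len(records) records with replacement (one bootstrap replicate)."""
    return [rng.choice(records) for _ in records]


def majority_vote(predictions):
    """Return the label predicted by the largest number of trees."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)          # fixed seed so the sketch is repeatable
records = list(range(10))       # placeholder record identifiers (assumption)
sample = bootstrap_sample(records, rng)

# Hypothetical predictions of five trees for one student
votes = ["Good", "Pass", "Good", "Best", "Good"]
winner = majority_vote(votes)
print(sample, winner)
```

Because each tree sees a different bootstrap replicate, some records appear several times in a replicate and others not at all, which is what makes the individual trees differ.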


4. APPLYING DATA MINING ALGORITHMS TO THE SELECTED DATASET
The dataset contained 300 instances with 24 attributes. The proposed framework is shown in Figure
1 below.


4.1. Data Preprocessing phase 

The data for this research was collected from three different colleges of Assam, India: Duliajan College,
Doomdooma College and Digboi College. Initially, data on twenty-four attributes were
collected. As the name of the student does not carry any significance, we removed it from the list of
attributes. The attribute "marks in practical paper" was also removed at the pre-processing phase because
of its large number of missing values. Finally, twenty-two attributes were selected after data
cleaning. Table 1 shows the selected attributes with their possible values.


[Figure 1 flowchart: Data collection from different colleges -> Creation of ARFF file based on the collected
data -> Applying WEKA tool to the data -> Feature selection to find the most relevant attributes for
classification -> Training data set / Test data set -> Classification algorithms -> Result evaluation and
visualization]

Figure 1: Framework for Students’ Academic Performance Classification

Table 1: Dataset Description

Attribute | Description | Values
GE | Gender | (Male, Female)
CST | Caste | (General, SC, ST, OBC, MOBC)
TNP | Class X Percentage | (Best, Very Good, Good, Pass, Fail)
    If percentage >= 80 then Best
    If percentage >= 60 but less than 80 then Very Good
    If percentage >= 45 but less than 60 then Good
    If percentage >= 30 but less than 45 then Pass
    If percentage < 30 then Fail
TWP | Class XII Percentage | (Best, Very Good, Good, Pass, Fail); same as TNP
IAP | Internal Assessment Percentage | (Best, Very Good, Good, Pass, Fail); same as TNP
ESP | End Semester Percentage | (Best, Very Good, Good, Pass, Fail); same as TNP
ARR | Whether the student has back or arrear papers | (Yes, No)
MS | Marital Status | (Married, Unmarried)
LS | Lived in Town or Village | (Town, Village)
AS | Admission Category | (Free, Paid)
FMI | Family Monthly Income (in INR) | (Very High, High, Above Medium, Medium, Low)
    If FMI >= 30000 then Very High
    If FMI >= 20000 but less than 30000 then High
    If FMI >= 10000 but less than 20000 then Above Medium
    If FMI >= 5000 but less than 10000 then Medium
    If FMI is less than 5000 then Low
FS | Family Size | (Large, Average, Small)
    If FS > 12 then Large
    If FS >= 6 but less than 12 then Average
    If FS < 6 then Small
FQ | Father Qualification | (IL, UM, 10, 12, Degree, PG); IL = Illiterate, UM = Under Class X
MQ | Mother Qualification | (IL, UM, 10, 12, Degree, PG); IL = Illiterate, UM = Under Class X
FO | Father Occupation | (Service, Business, Retired, Farmer, Others)
MO | Mother Occupation | (Service, Business, Retired, Farmer, Others)
NF | Number of Friends | (Large, Average, Small); same as Family Size
SH | Study Hours | (Good, Average, Poor)
    >= 6 hours Good; >= 4 hours Average; < 2 hours Poor
SS | Student School attended at Class X level | (Govt., Private)
ME | Medium | (Eng, Asm, Hin, Ben)
TT | Home to College Travel Time | (Large, Average, Small)
    >= 2 hours Large; >= 1 hour Average; < 1 hour Small
ATD | Class Attendance Percentage | (Good, Average, Poor)
    If percentage >= 80 then Good
    If percentage >= 60 but less than 80 then Average
    If percentage < 60 then Poor
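
The discretization rules of Table 1 can be expressed as small functions. The following Python sketch covers the percentage attributes (TNP, TWP, IAP, ESP) and the family monthly income (FMI); the function names `percentage_category` and `income_category` are hypothetical, and the sketch assumes the thresholds exactly as listed in the table.

```python
def percentage_category(percentage):
    """Map a percentage to the five-level scale used for TNP, TWP, IAP and ESP."""
    if percentage >= 80:
        return "Best"
    if percentage >= 60:
        return "Very Good"
    if percentage >= 45:
        return "Good"
    if percentage >= 30:
        return "Pass"
    return "Fail"


def income_category(fmi):
    """Map family monthly income (in INR) to the five-level FMI scale."""
    if fmi >= 30000:
        return "Very High"
    if fmi >= 20000:
        return "High"
    if fmi >= 10000:
        return "Above Medium"
    if fmi >= 5000:
        return "Medium"
    return "Low"


print(percentage_category(72.5), income_category(12000))
```

Checking the thresholds from the highest downwards keeps each rule to a single comparison, since every earlier test has already excluded the higher bands.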


Descriptions of some of the attributes of the dataset


CST: It is the caste of the student. The possible values of this attribute are ‘G’ (General or unreserved
category), ‘SC’ (Scheduled Caste category), ‘ST’ (Scheduled Tribe category), ‘OBC’ (Other Backward
Classes) and ‘MOBC’ (Minorities and Other Backward Classes). These categories are based on the Indian
Constitution.

TNP: It is the percentage attained by the student in Class X. The examination is called the HSLC Examination in
Assam, India. The authors categorized the results as Best, Very Good, Good, Pass and Fail. ‘Best’ is
assigned when the student secured 80% or more (termed Star percentage). ‘Very Good’ is
assigned when the student secured 60% or more but less than 80% (60% or more
is termed First Division or Class in most examinations). ‘Good’ is assigned when the
student secured 45% or more but less than 60% (in most of the universities in Assam it is called
Second Division or Class). ‘Pass’ is assigned when the student secured 30% or more but less than
45%. ‘Fail’ is assigned when the student secured less than 30%. The same is true for TWP (Class XII
percentage secured by the student), IAP (Internal Assessment percentage secured by the student at Degree
level (10+2+3)) and ESP (End Semester Examination percentage secured by the student at Degree level).
ESP is the response variable.

IAP (Internal Assessment percentage secured by the student at Degree level (10+2+3)): Internal Assessment
is part of continuous evaluation. It comprises sessional examinations, surprise tests, assignments, field
work, quizzes etc. It is categorized in the same way as TNP, TWP and ESP.

ARR: It is categorized as ‘Yes’ or ‘No’. This attribute records whether the
student had failed any paper in any of the previous semesters.

ME: It is categorized as ‘Eng’ (English), ‘Asm’ (Assamese), ‘Hin’ (Hindi) and ‘Ben’ (Bengali). Assamese,
Hindi and Bengali are modern Indian languages. It is the language or medium of instruction in
which the students were taught or appeared in an examination.

FQ: The possible values of this attribute are ‘IL’ (Illiterate), ‘UM’ (Under Class X level), ‘10’ (Passed Class X
Examination), ‘12’ (Passed Class XII Examination), ‘Degree’ (Passed Bachelor of Arts or Science or
Commerce Examination) and ‘PG’ (Passed Master of Arts or Science or Commerce Examination). It is the
educational qualification of the father of the student. MQ stands for mother qualification. The possible values
of this attribute are the same as for father qualification.


4.2 Feature Selection 

Using WEKA, feature selection discovers the most influential attributes by means of correlation-based
attribute evaluation, gain-ratio attribute evaluation, information-gain attribute evaluation, relief attribute
evaluation and symmetrical uncertainty attribute evaluation. Correlation-based attribute evaluation is a greedy
search method while the others are rank search methods [18].

Using these feature selection methods, a total of eleven attributes were found to be highly influential.
The selected attributes are shown in bold in Table 2. They were used for classification and the other attributes
were removed. The end semester percentage (esp) is the response variable. Figure 2 shows the data in the ARFF
format.
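
To illustrate how a rank-based evaluator such as information gain scores an attribute against the response variable, a minimal Python sketch is given below. It is not WEKA's implementation; the function names and the four toy records are hypothetical and serve only to show the computation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(attribute_values, labels):
    """Reduction in entropy of the labels after splitting on the attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy records (assumption, not from the study's dataset)
iap = ["Good", "Good", "Pass", "Pass"]
ge = ["M", "F", "M", "F"]
esp = ["Good", "Good", "Fail", "Fail"]

gain_iap = information_gain(iap, esp)
gain_ge = information_gain(ge, esp)
print(gain_iap, gain_ge)
```

In this toy example the internal assessment attribute separates the classes perfectly and receives the maximum gain, while gender carries no information about the outcome; a ranker would therefore place iap above ge.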


Table 2: Attribute Selection using feature selection methods

Feature Selection Method | High Influence Attributes
Correlation-based Attribute Evaluation | arr, iap, tnp, as, twp, sh, me, fs, nf, atd, fo, fmi, fq, tt, ss
Gain-Ratio Attribute Evaluation | iap, ms, arr, tnp, twp, as, me, sh, atd, fmi, fq, nf, fo, mq, fs
Information-Gain Attribute Evaluation | iap, tnp, twp, arr, fmi, as, fq, me, atd, sh, fo, mq, nf, cst, tt
Relief Attribute Evaluation | iap, tnp, arr, tnp, nf, as, atd, me, fo, sh, fmi, fs, ls, ge, tt
Symmetrical Uncertainty Attribute Evaluation | iap, tnp, twp, arr, as, me, fmi, atd, sh, fq, fo, mq, nf, fs, tt


@RELATION SapTilel

@ATTRIBUTE ge {M,F}
@ATTRIBUTE cst {G,ST,SC,OBC,MOBC}
@ATTRIBUTE tnp {Best,Vg,Good,Pass,Fail}
@ATTRIBUTE twp {Best,Vg,Good,Pass,Fail}
@ATTRIBUTE iap {Best,Vg,Good,Pass,Fail}
@ATTRIBUTE esp {Best,Vg,Good,Pass,Fail}
@ATTRIBUTE arr {Y,N}
@ATTRIBUTE ms {Married,Unmarried}
@ATTRIBUTE ls {T,V}
@ATTRIBUTE as {Free,Paid}
@ATTRIBUTE fmi {Vh,High,Am,Medium,Low}
@ATTRIBUTE fs {Large,Average,Small}
@ATTRIBUTE fq {Il,Um,10,12,Degree,Pg}
@ATTRIBUTE mq {Il,Um,10,12,Degree,Pg}
@ATTRIBUTE fo {Service,Business,Retired,Farmer,others}
@ATTRIBUTE mo {Service,Business,Retired,Housewife,others}
@ATTRIBUTE nf {Large,Average,Small}
@ATTRIBUTE sh {Good,Average,Poor}
@ATTRIBUTE ss {Govt,Private}
@ATTRIBUTE me {Eng,Asm,Hin,Ben}
@ATTRIBUTE tt {Large,Average,Small}
@ATTRIBUTE atd {Good,Average,Poor}

@DATA
F,G,Good,Good,Vg,Good,Y,Unmarried,V,Paid,Medium,Average,Um,10,Farmer,Housewife,Large,Poor,Govt,...

Figure 2: Data File in arff format

4.3 Specifying the selected algorithms

After feature selection, the classification algorithms were applied. There are various classification
methods: Decision Tree, Neural Network, Naive Bayes, K-Nearest Neighbor, Random Forest, AdaBoost,
Support Vector Machines etc. [13]. For mining the academic performance of the students, the authors used
specific algorithms that are available in the WEKA program: the J48, PART, BayesNet and Random
Forest classification algorithms. According to the WEKA algorithm specifications [30], J48 is an algorithm
that generates a pruned or unpruned C4.5 decision tree. PART is an algorithm that generates a PART decision
list: it uses a separate-and-conquer mechanism, builds a partial C4.5 decision tree in each iteration and
makes the best leaf into a rule. BayesNet learns a Bayes network using
various search algorithms and quality measures. It also offers data structures (network structure, conditional
probability distributions, etc.) and facilities common to Bayes Network learning algorithms. Random Forest is
a group of unpruned classification or regression trees that are created using bootstrap samples of the training
data and random feature selection in tree induction, which together construct a forest of random trees [30,
31]. The authors then compared the algorithms based on their accuracy to select the best-performing
algorithm for the job.


5. EXPERIMENTS AND RESULTS 
5.1 Classification Results: 

WEKA offers various classification algorithms. The authors
used the J48, BayesNet, PART and Random Forest classification methods available in WEKA. These methods
are supervised learning algorithms, which learn from labeled training data and whose correctness is then
checked on test data [20]. Figure 4 shows the comparison between these four classifiers.


J48 Classifier: This classifier is used for generating a decision tree based on the C4.5 algorithm, which was
developed by Ross Quinlan [20]. Its performance is shown in Figure 6.

BayesNet Classifier: This classifier is reported to deliver high accuracy on large databases and to
reduce computation time. A Bayesian Network represents conditional dependencies using a directed
graph [20].

Random Forest Classifier: This classifier uses the bootstrap sampling method on the training dataset to
construct many unpruned classification trees. In the testing phase, the mean of all unpruned classification
trees for a randomly selected feature provides the final predicted output [32]. Its performance is shown in
Figures 7 and 8.

PART Classifier: This rule-learning classifier combines the divide-and-conquer strategy with the
separate-and-conquer strategy. It builds a partial decision tree on the current set of instances and creates a
rule from the decision tree [33].

There are 300 student records from three different colleges with 12 selected attributes. Table 3 shows the 
performance of the 4 classification methods based on their accuracy. 


Table 3: Comparison of different classifiers based on accuracy. 


Classifiers | Accuracy | Correctly Classified Instances | Incorrectly Classified Instances
Random Forest | 99% | 297 | 3
PART | 74.33% | 223 | 77
J48 | 73% | 219 | 81
BayesNet | 65.33% | 196 | 104


Based on the accuracy of the four classifiers, Random Forest has more correctly classified instances than
the other classification methods. Its accuracy is 99%. Figures 4 and 5 show that the Random Forest
classifier has the lowest errors in terms of Mean Absolute Error (MAE), Root Mean Square Error
(RMSE), Relative Absolute Error (RAE) and Root Relative Squared Error (RRSE) when compared with
the other methods.

Comparison of Different Classification Models

Figure 6. J48 Tree Visualization


The Kappa statistic is 0.9859, which indicates almost perfect agreement between the predicted and the
actual classes beyond what chance alone would produce. So, this model may be used for the prediction of the
final semester percentage of the student.
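
For reference, the Kappa statistic can be computed from a confusion matrix as the observed agreement corrected for the agreement expected by chance. The Python sketch below shows the calculation; the function name `cohen_kappa` and the 2x2 matrix are hypothetical and are not the confusion matrix of this study.

```python
def cohen_kappa(confusion):
    """Cohen's kappa; confusion[i][j] counts actual class i predicted as class j."""
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected if predictions were independent of the actual classes
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical confusion matrix (assumption, for illustration only)
confusion = [[45, 5],
             [10, 40]]
kappa = cohen_kappa(confusion)
print(kappa)
```

A value of 1 means perfect agreement and a value of 0 means agreement no better than chance, so a value such as 0.9859 lies very close to the upper end of the scale.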

The authors also compared the Random Forest classifier with and without
feature selection. The Random Forest classifier with feature selection outperforms the one without. Table 4
shows the comparison.

Figure 7: Random Forest Visualization of Cost Curve of ‘Best’ Class of ‘end semester percentage’ attribute


Table 4: Comparison of Random Forest Classifier with and without selected attributes


Classifiers | Accuracy | Correctly Classified Instances | Incorrectly Classified Instances
Random Forest with 12 selected attributes | 99% | 297 | 3
Random Forest with all the attributes | 84.33% | 253 | 47

Figure 8. Random Forest Visualization of Cost/Benefit Analysis for ‘Good’ Class of ‘end semester
percentage’ attribute

5.2 Association Rule results
Association rules are used for analyzing the data to uncover frequent if/then patterns. The most
important relationships are identified by using support and confidence criteria. An association rule
comprises two parts: an antecedent (if part) and a consequent (then part). The Apriori algorithm is
one of the most frequently used algorithms for finding such associations in data [12]. Using WEKA, we
applied the Apriori algorithm to our dataset. The minimum support was 0.6 (180 instances), the minimum
metric (confidence) was 0.9 and the number of cycles performed was 8. The best rules found are shown
below:


1. ls=V 240 ==> ms=Unmarried 240 <conf:(1)> lift:(1) lev:(0) [0] conv:(0.8)

2. ls=V mo=Housewife 213 ==> ms=Unmarried 213 <conf:(1)> lift:(1) lev:(0) [0] conv:(0.71)

3. fs=Small 202 ==> ms=Unmarried 202 <conf:(1)> lift:(1) lev:(0) [0] conv:(0.67)

4. as=Free 191 ==> ms=Unmarried 191 <conf:(1)> lift:(1) lev:(0) [0] conv:(0.64)

5. fs=Small mo=Housewife 182 ==> ms=Unmarried 182 <conf:(1)> lift:(1) lev:(0) [0] conv:(0.61)

6. ls=V ss=Govt 181 ==> ms=Unmarried 181 <conf:(1)> lift:(1) lev:(0) [0] conv:(0.6)

7. mo=Housewife 269 ==> ms=Unmarried 268 <conf:(1)> lift:(1) lev:(-0) [0] conv:(0.45)

8. ss=Govt 221 ==> ms=Unmarried 220 <conf:(1)> lift:(1) lev:(-0) [0] conv:(0.37)

9. mo=Housewife ss=Govt 200 ==> ms=Unmarried 199 <conf:(0.99)> lift:(1) lev:(-0) [0] conv:(0.33)

10. me=Asm 193 ==> ms=Unmarried 192 <conf:(0.99)> lift:(1) lev:(-0) [0] conv:(0.32)
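
The measures printed with each rule (confidence, lift, leverage and conviction) can be reproduced from instance counts. The Python sketch below does so for the first rule. Two points are assumptions rather than facts from the paper: the count of 299 instances with ms=Unmarried is inferred for illustration only, and the conviction is computed with a smoothed variant (+1 in the denominator) that happens to reproduce the printed values; whether this matches the tool's internal formula exactly is not confirmed by the source. The function name `rule_metrics` is hypothetical.

```python
def rule_metrics(premise_count, both_count, consequent_count, total):
    """Interestingness measures for a rule 'premise ==> consequent' from counts."""
    support = both_count / total
    confidence = both_count / premise_count
    lift = confidence / (consequent_count / total)
    leverage = support - (premise_count / total) * (consequent_count / total)
    # Smoothed conviction (assumption): +1 avoids division by zero
    # when the rule has no counterexamples.
    conviction = (premise_count * (total - consequent_count) / total) / (
        premise_count - both_count + 1
    )
    return {
        "support": support,
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


# Rule 1: ls=V (240 instances) ==> ms=Unmarried (240 of them), 300 records in total.
# consequent_count=299 is an inferred, hypothetical value.
m = rule_metrics(premise_count=240, both_count=240, consequent_count=299, total=300)
print(m)
```

A lift close to 1 means that the antecedent and the consequent are nearly independent, so a rule can have high confidence simply because its consequent is true for almost every record.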




The experiment was performed again with the selected attributes. This time the minimum support
was 0.1 (30 instances), the minimum metric (confidence) was 0.9 and the number of cycles performed was 18. The
best rules found are shown below:


1. esp=Fail 32 ==> arr=Y 32 <conf:(1)> lift:(1.97) lev:(0.05) [21] conv:(15.79)

2. fmi=Low fo=Farmer me=Asm 32 ==> as=Free 32 <conf:(1)> lift:(1.57) lev:(0.04) [15] conv:(11.63)

3. arr=Y fo=Farmer nf=Average 31 ==> as=Free 31 <conf:(1)> lift:(1.57) lev:(0.04) [15] conv:(11.26)

4. twp=Good iap=Good arr=Y fo=Farmer 32 ==> me=Asm 31 <conf:(0.97)> lift:(1.51) lev:(0.03) [33] conv:(5.71)

5. tnp=Good fmi=Low me=Asm 31 ==> as=Free 30 <conf:(0.97)> lift:(1.52) lev:(0.03) [33] conv:(5.63)

6. arr=Y nf=Average me=Asm 50 ==> as=Free 48 <conf:(0.96)> lift:(1.51) lev:(0.05) [13] conv:(6.06)

7. twp=Good iap=Good as=Free fo=Farmer 44 ==> me=Asm 42 <conf:(0.95)> lift:(1.48) lev:(0.05) [14] conv:(5.23)

8. fmi=Low fo=Farmer 43 ==> as=Free 41 <conf:(0.95)> lift:(1.5) lev:(0.05) [14] conv:(5.21)

9. iap=Good arr=Y fo=Farmer 43 ==> as=Free 41 <conf:(0.95)> lift:(1.5) lev:(0.05) [14] conv:(5.21)

10. iap=Good arr=Y fo=Farmer me=Asm 40 ==> as=Free 38 <conf:(0.95)> lift:(1.49) lev:(0.04) [18] conv:(4.84)


6. CONCLUSION AND FUTURE WORK 

The students' academic performance was evaluated based on academic and personal data collected
from 3 different colleges of Assam, India. The total number of records was 300, with 24 attributes. Two
attributes were dropped in the data cleaning phase. Using feature selection, 12 highly influential attributes
were selected. After that, four different classification algorithms were used: J48, PART, BayesNet
and Random Forest. The data mining tool used in the experiment was WEKA 3.8. Based on the accuracy and
the classification errors, one may conclude that the Random Forest classification method was the most suitable
algorithm for the dataset. The Apriori algorithm was applied to the dataset using WEKA to find some of the
best rules. As future work, the data may be extended with extra-curricular aspects and technical skills of the
students and mined with different classification algorithms to predict student performance.
The authors are also interested in working in future on students' assessment data for each course, in order to
find out what kind of student succeeds in what kind of course. This may define what kinds of courses are suited
to each cluster of students who share the same characteristics. It may also provide various
multidimensional summary reports and redefine pedagogical learning paths.